Grapheme-to-Phoneme Models for (Almost) Any Language

نویسندگان

  • Aliya Deri
  • Kevin Knight
چکیده

Grapheme-to-phoneme (g2p) models are rarely available in low-resource languages, as the creation of training and evaluation data is expensive and time-consuming. We use Wiktionary to obtain more than 650k word-pronunciation pairs in more than 500 languages. We then develop phoneme and language distance metrics based on phonological and linguistic knowledge; applying those, we adapt g2p models for highresource languages to create models for related low-resource languages. We provide results for models for 229 adapted languages.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Integrating Thai grapheme based acoustic models into the ML-MIX framework - for language independent and cross-language ASR

Grapheme based speech recognition is a powerful tool for rapidly creating automatic speech recognition (ASR) systems in new languages. For purposes of language independent or cross language speech recognition it is necessary to identify similar models in the different languages involved. For phoneme based multilingual ASR systems this is usually achieved with the help of a language independent ...

متن کامل

Conversion from phoneme based to grapheme based acoustic models for speech recognition

This paper focuses on acoustic modeling in speech recognition. A novel approach how to build grapheme based acoustic models with conversion from existing phoneme based acoustic models is proposed. The grapheme based acoustic models are created as weighted sum from monophone acoustic models. The influence of particular monophone is determined with the phoneme to grapheme confusion matrix. Furthe...

متن کامل

Statistical Grapheme to Phoneme Conversion using Language Origin

This report describes a method for grapheme to phoneme conversion using statistical models of pronunciation. The available techniques for this conversion are first described and examples of each are given. A baseline system which uses Hidden Markov Models to represent phonemes in English is described and evaluated. The results from the baseline system serve to replicate previous research and to...

متن کامل

Investigations on joint-multigram models for grapheme-to-phoneme conversion

We present a fully data-driven, language independent way of building a grapheme-to-phoneme converter. We apply the joint-multigram approach to the alignment problem and use standard language modelling techniques to model transcription probabilities. We study model parameters, training procedures and effects of corpus size in detail. Experiments were conducted on English and German pronunciation...

متن کامل

Machine Learning Based English-to-Korean Transliteration Using Grapheme and Phoneme Information

Machine transliteration is an automatic method to generate characters or words in one alphabetical system for the corresponding characters in another alphabetical system. Machine transliteration can play an important role in natural language application such as information retrieval and machine translation, especially for handling proper nouns and technical terms. The previous works focus on ei...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016